Stochastic Recognition of Human Physical Activities via Augmented Feature Descriptors and Random Forest Model

Human physical activity recognition (HPAR) from inertial sensors has been shown to be a successful approach for monitoring elderly individuals and children in indoor and outdoor environments. As a result, researchers have shown significant interest in developing state-of-the-art machine learning methods capable of utilizing inertial sensor data and providing key decision support in different scenarios. This paper analyzes data-driven techniques for recognizing human daily living activities. To improve the recognition and classification of human physical activities (for example, walking, drinking, and running), we introduce a model that integrates data preprocessing methods (such as denoising) with features from the major domains (time, frequency, wavelet, and time–frequency). Stochastic gradient descent (SGD) is then used to optimize the extracted features, and the selected features are fed to a random forest classifier to detect and monitor human physical activities. The proposed HPAR system was evaluated on five benchmark datasets, namely the IM-WSHA, PAMAP-2, UCI HAR, MobiAct, and MOTIONSENSE databases. The experimental results show that the HPAR system outperformed the present state-of-the-art methods, with recognition rates of 90.18%, 91.25%, 91.83%, 90.46%, and 92.16% on the IM-WSHA, PAMAP-2, UCI HAR, MobiAct, and MOTIONSENSE datasets, respectively. The proposed HPAR model has potential applications in healthcare, gaming, smart homes, security, and surveillance.


Introduction
Human physical activity recognition (HPAR) is a subject of research that focuses on developing and experimenting with novel techniques for automatically recognizing activities via signals acquired by wearable or ambient sensors [1]. However, for the most part, ambient sensors require installation in a household environment, and appliances such as camera systems are seen as obtrusive, specifically by aging people [2]. For such reasons, the emphasis has turned to the employment of wearable sensors in recent years. Fitness trackers, smartphones, and inertial sensors are currently receiving adequate attention [3][4][5][6]. This is mainly due to the widespread use of gadgets by the general public and the incorporation

• The augmentation of discriminative features from various domains makes the proposed human physical activity recognition model robust in the presence of noisy data. It maintains locally dependent characteristics of the random forest algorithm, providing a novel approach for improving recognition performance across all five benchmark datasets.
• A hybrid feature descriptor model with random forest is proposed to cope with the convoluted patterns of human physical motion activities, with improved classification accuracies on all datasets.
• The complex behavior transitions, especially in our self-annotated dataset IM-WSHA, require more time to recognize activities. Therefore, we utilized a larger window size so that our model could work with a minimal number of changes. We also created a self-annotated dataset named Intelligent Media Wearable Smart Home Activities (IM-WSHA), comprising 11 (static and dynamic) daily life log activities, along with divergences in gender, weight, height, and age.
• Additionally, a comprehensive analysis was performed for human physical activities on five public benchmark datasets: IM-WSHA, PAMAP-2, UCI HAR, MobiAct, and MOTIONSENSE. Experimental results reveal an improved recognition rate, which also outperforms other state-of-the-art systems.
The remainder of this study is structured as follows: Section 2 provides a detailed overview of the literature concerning human physical activity analyses. Section 3 addresses the proposed framework of our HPAR model. Section 4 analyzes the five benchmark datasets along with the detailed experimental results. Finally, Section 5 presents the paper's conclusion and future research perspectives.

Related Work
There are two standard ways to analyze HPAR: vision sensor-based HPAR and wearable sensor-based HPAR. Various characteristics and insights may be drawn from this analysis, including the acquired image and signals, extracted feature descriptors, and methods utilized for dimensionality reduction and human activity classification. This section summarizes previous research on human physical activity recognition (HPAR) analyses via vision sensors and wearable sensors.

HPAR via Vision Sensors
Vision-based HPAR relies entirely on visual sensing technologies, including surveillance and video cameras, image sequences, video sequences, modeling, segmentation, detection, and tracking. Liu et al. [16] presented a human activity recognition system incorporating a non-linear support vector machine (SVM) to recognize twenty distinct human activities via an accelerometer and RGBD camera sensor data. Their experimental results indicate that their proposed method is significantly more robust and effective than the baseline method at recognizing activities. However, the main constraint of this study involved the performance of unusual classes, particularly transition activity classes. Additionally, they intended to improve the performance by incorporating this class imbalance issue into their classification model. Yang et al. [17] developed a novel model for identifying human activities from a video series recorded by depth-based cameras. Additionally, they discussed the low-level polynomial designed from a nearby local hyperspace. Furthermore, their proposed system is adaptable, i.e., it could be used in cooperation with the joint trajectory-matched depth sequence. Their proposed model was comprehensively analyzed and tested on five benchmark datasets. The experimental outcomes reveal that their proposed strategy outperformed existing methods on these datasets by a significant margin. However, their proposed system lacked the utilization of complementary information along with the integration of various features from both color and depth channels in order to create more state-of-the-art representations. Sharif et al. [18] proposed a hybrid technique for efficiently classifying daily human activities from an acquired video frame. In addition, their proposed system involves two significant steps. Initially, various subjects were detected in the acquired video frame via a combination of a new uniform and EM segmentation. 
Then, utilizing vector dimensions, they extracted local properties from specified sequences and combined them. Additionally, a new Euclidean distance along with joint entropy was exploited to pick the optimal features from the augmented vector. The optimal feature descriptors were catered to the classifiers for human activity recognition. However, occlusions were not addressed in this work. Another possibility is to incorporate saliency to maximize segmentation accuracy. In [19], Patel et al. proposed a method for detecting and recognizing the daily living activities of humans. Additionally, they explored various human visual databases to detect and monitor multiple human subjects. The background subtraction method was utilized to monitor the different persons in motion. In comparison, human daily living activities via the HOG feature extraction and an SVM classifier generate better recognition results with fewer false detections. Ji et al. [20] introduced a unique approach for interactive behavior recognition based on different stage probability fusions. Additionally, they dealt with the present issues in the interaction classification algorithms, including inadequate feature descriptors resulting from improper human body segmentation. Therefore, a multi-stage-based fusion strategy was presented to deal with this issue. However, this technique is ineffective at addressing the intrinsic characteristics of human behavior; instead, it is useful for categorizing abnormal behaviors, such as violent acts and unusual events. In [21], Wang et al. presented a probabilistic-based graphical framework for human physical activity recognition. Additionally, they addressed the issue of segmenting and recognizing continuous action. However, these methods operate only in offline mode. Ince et al. [22] developed a biometric system framework for detecting human physical activities in a three-dimensional space using skeletal joint angle patterns. 
Additionally, this framework exploits an RGB-depth camera, which appears suitable for video surveillance and elderly care facilities. However, the model has a few drawbacks; chiefly, improper skeletal detection results in wrong angle estimations and imprecise classifications.

HPAR via Wearable Sensors
Wearable-based inertial sensors have revolutionized every characteristic of our daily lives, from healthcare to ease and comfort. Therefore, due to the substantial demands for improved processing capacities and reduced size requirements, we analyzed IMU-based systems in this research. Irvine et al. [23] introduced a homogenous ensemble neural network method for identifying daily living activities in an indoor environment. Additionally, four standard models were developed and combined using support function fusion. Furthermore, they tested their proposed framework, the ensemble neural network method, by evaluating the attained HPAR performance with two non-parametric standard classifiers. The ensemble neural network technique outperformed both standard models, revealing the robustness of the proposed ensemble method. However, the work was restricted with no method for determining a relevant subset of input features. In [24], Feng et al. introduced an ensemble technique for recognizing HPAR, utilizing several wearable inertial sensors by integrating an independent random forest algorithm. The improved forecasting capabilities of the random forest resulted in a better option for wearable sensor-based healthcare tracking systems. Gupta et al. [25] presented an effective physical activity recognition system based on a portable wearable accelerometer that can be employed in a real-life application of elderly monitoring. Additionally, they incorporated effective capabilities for recognizing transitional behaviors. The proposed statistical features extracted additional information about the inertial signals in the time-frame window. Furthermore, additional cues are assessed to extract signal correlation. However, the fundamental challenge of this work is that only two individuals were used to acquire information, which limits the database's applicability in various environments. In [26], Abidine et al. 
developed a weighted support vector machine (SVM) for tracking human life log activities in an indoor environment. Additionally, they addressed various implementation issues with the HAR methods, including redundant sequence characteristics and group variances in the learning set. To address these problems, they presented a novel technique for recognizing life log activities in an indoor environment. Furthermore, the entire model was based on the fusions of different algorithms, including PCA, SVM, and LDA. To begin, the learning set was lessened via the PCA and LDA features. Then, an SVM classifier was used for each class to handle the unbalanced life log activity database to maximize the detection rate. In another study, Cillis et al. [27] proposed a ubiquitous novel solution for locomotion patterns via a wearable-based inertial accelerometer sensor. Additionally, their proposed model utilized a finite feature set along with a decision tree classifier to recognize four distinct human locomotion patterns. Firstly, they acquired features from both individual and dynamic sets of windows. The experimental outcomes indicated that accuracy was better when performing static tasks but much lower when performing dynamic tasks. The model's low processing overhead may make it well-suited for real-time applications in medical care. In [28], Tian et al. presented an ensemble learning approach for recognizing human physical activities. Three state-of-the-art classifiers and multiple SVMs were trained by numerous features, resulting in an ensemble learning-based system. Additionally, an adaptive hybrid model extracted various features from human physical activities to improve their recognition rate. Jing et al. [29] developed a HAR-based system for tracking life log daily activities along with fall detection by using various wearable inertial sensors. Javed et al. 
[30] presented a state-of-the-art technique to recognize human physical activities via sensory data acquired from a two-axis smartphone accelerometer. In addition, this study also determined the efficacy and impact of the individual accelerometer axis in classifying human physical activities. Furthermore, this technique incorporates multi-modal sensory data acquired from three body-worn sensors. This study demonstrates that the augmentation of inertial sensor data improves the HAR accuracy. The entire system was compared to a complete activity set comprising cyclic, static, and random actions. Furthermore, time and frequency domain features were extracted to gain optimal results.

Material and Methods
The proposed HPAR system acquired raw signals from five benchmark datasets recorded with MEMS inertial sensors. First, a preprocessing step used a third-order median filter to eliminate the saw-tooth wave noise caused by abrupt displacement, and the filtered signal values were organized into time blocks of comparable duration. Second, for feature extraction, we proposed an augmented feature pool comprising five different features in four domains: time, frequency, wavelet, and time-frequency. The acquired features were normalized using their extreme values to eliminate the possibility of complex values appearing during the final phases of feature selection. Third, a feature selection strategy was adopted to optimize the feature vectors so that only the relevant optimal features were retained for further phases of data processing. Finally, the selected optimal features were fed to the random forest classifier, which was trained and tested on the optimal feature descriptor set. The proposed architecture of HPAR is presented in Figure 1.
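The blocking and normalization steps of this pipeline can be illustrated with a minimal Python sketch. It assumes that "time blocks of comparable duration" means fixed-length sliding windows and that normalization "using extreme values" means min-max scaling; the function names, window size, and step are hypothetical and not taken from the paper.

```python
def sliding_windows(signal, window_size, step):
    """Split a 1-D signal into fixed-length blocks (any trailing remainder is dropped)."""
    last_start = len(signal) - window_size
    return [signal[s:s + window_size] for s in range(0, last_start + 1, step)]


def min_max_normalize(values):
    """Scale values to [0, 1] using their minimum and maximum (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant vector carries no range information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical filtered samples, split into windows of 4 samples with 50% overlap.
blocks = sliding_windows([0, 1, 2, 3, 4, 5], window_size=4, step=2)
scaled = min_max_normalize([2, 4, 6])
```

Overlapping windows are a common choice in activity recognition, but the overlap actually used by the authors is not stated in this section.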

Data Acquisition and Signal Denoising
Feature extraction was highly dependent on the denoising stage, so it was critical to remove noise from the acquired raw data [31]. The data collected from the MEMS inertial measurement units were highly vulnerable to interference and noise, resulting in raw signal variances and, consequently, feature loss. As a result, we applied a median filter to the inertial sensor-based benchmark datasets to reduce the related noise. Figure 2 illustrates the unprocessed and denoised inertial signal components obtained with the third-order median filter.
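As an illustration of this denoising step, the following Python sketch applies a sliding median filter. It assumes that "third-order" refers to a window of three samples and that the signal edges are padded by repeating the boundary values; both choices, along with the sample values, are assumptions rather than details given in the text.

```python
from statistics import median


def median_filter(signal, kernel_size=3):
    """Replace each sample with the median of its neighborhood (edge values repeated)."""
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    if not signal:
        return []
    half = kernel_size // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [median(padded[i:i + kernel_size]) for i in range(len(signal))]


# Hypothetical accelerometer samples with a single spike at index 3.
raw = [0.1, 0.2, 0.1, 5.0, 0.2, 0.1, 0.3]
denoised = median_filter(raw)  # the spike is replaced by a neighboring value
```

A median filter suppresses isolated spikes without smearing them into adjacent samples, which is the usual reason for preferring it over a moving average for impulsive noise.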

Feature Extraction
In this phase, we proposed an augmented feature model to obtain important feature descriptors that assist the analysis of inertial signals. It was composed of descriptors from four major domains: time, frequency, wavelet, and time-frequency. The filtered signals were streamed and used to extract features from the sensor data stream. Furthermore, signal features were retrieved from within the confined region with adequate contextual information.

Statistical Features
The statistical descriptors ($S_d$) comprise the mean, mode, median, and minimum/maximum of the IMU signal. These descriptors are important for assessing the aggregate differences that arise from each of the n physical activities. In the corresponding expression, n is the framed vector data size, a is the whole number of coefficients in the vector, V depicts the initial vector value, and $\bar{I}$ represents the mean of the vector data. Figure 3 shows a three-axis plot augmented with different time domain features of walking activities extracted from the MOTIONSENSE dataset.
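The descriptors listed above can be computed per window as in the following Python sketch. The function name and the sample window are hypothetical, and taking the smallest value when several modes tie is an assumption made only to keep the output deterministic.

```python
from statistics import mean, median, multimode


def statistical_features(window):
    """Return the mean, median, mode, minimum, and maximum of one signal window."""
    return {
        "mean": mean(window),
        "median": median(window),
        # multimode returns every most-frequent value; keep the smallest on ties.
        "mode": min(multimode(window)),
        "min": min(window),
        "max": max(window),
    }


# Hypothetical window of quantized sensor readings.
features = statistical_features([1, 2, 2, 3, 7])
```

For continuous-valued sensor data the mode is only informative after quantization or rounding, so in practice the window would likely be binned first.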

Hilbert-Huang Transform (HHT)
The HHT is believed to be highly effective for dealing with non-linear and stochastic signal data [32]. The five benchmark datasets involved different inertial time series, and the IMU data from the different sensors were generally non-linear. Thus, the Hilbert-Huang transform divided the resultant time series of non-linear IMU data into distinct oscillatory components called intrinsic mode functions (IMFs) (see Figure 4). This procedure is known as empirical mode decomposition. These components occupied distinct frequency bands from which shifts in instantaneous frequency could be computed, so we could make valid comparisons between the attributes of diverse activities. The processed data can be expressed as:
$$P(s) = \sum_{a=1}^{n} c_a + r_n \tag{2}$$
where $P(s)$ represents the processed inertial signal, $c_a$ indicates the $a$th of the $n$ IMFs, and $r_n$ depicts the whole remainder.

Haar Wavelet Transform
The Haar wavelet transform (HWT) has evolved as a sophisticated technique in the domain of image and signal analysis. In general, wavelets are mathematical tools for hierarchically decomposing functions [33]. In our HPAR model, the Haar wavelet-based features were used to recognize patterns at specific intervals in order to examine signal variations. In addition, the Haar wavelet transform involves a wavelet-based structure (see Figure 5), which makes it a robust and reliable signal processing technique. HWTs are denoted by their coefficients (a, d), with 'a' depicting the approximation coefficients and 'd' representing the detail coefficients. Moreover, these coefficients facilitate estimating the IMU signal's total power and support appropriate reconstruction and segmentation. In the HWT expression, the scaling function is denoted ψ(f).
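To make the roles of the two coefficient sets concrete, the following Python sketch performs a single level of the orthonormal Haar decomposition. It is a generic textbook formulation, not the authors' implementation; the function name and input values are hypothetical.

```python
import math


def haar_dwt(signal):
    """One-level orthonormal Haar transform: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(x0 + x1) * scale for x0, x1 in pairs]  # scaled pairwise sums
    detail = [(x0 - x1) * scale for x0, x1 in pairs]  # scaled pairwise differences
    return approx, detail


# Hypothetical window of inertial samples.
window = [4.0, 6.0, 10.0, 12.0]
a, d = haar_dwt(window)

# With the orthonormal scaling, the coefficients preserve the signal energy,
# which is why they can be used to estimate the signal's total power.
energy_in = sum(x * x for x in window)
energy_out = sum(c * c for c in a) + sum(c * c for c in d)
```

Applying the same function again to the approximation coefficients gives the next, coarser decomposition level.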

Figure 5. The 1D-HWT feature plot of the inertial signal for a daily activity (walking) from the USC-HAD dataset.

Spectral Entropy
Spectral entropy quantifies the randomness in a model, which contributes to the system's complexity [34]. The system's complexity provides significant information, such as random variations in body activity. These data are utilized to distinguish between various life log activities (see Figure 6). Additionally, they assist in estimating the IMU signal spectral range, which generates a power spectrum involving important information about a particular activity. The following steps were used to acquire the features presented by spectral entropy.

• Firstly, the acquired IMU signal's power spectrum was normalized and denoted as $P_{sp}(f)$.
• To extract modified elements, we applied the Shannon function to the normalized power spectrum.
• Finally, the acquired $Q_{sp}(f)$ elements were summed to give the spectral entropy, $SE_{sp}$.
Sensors 2022, 22, x FOR PEER REVIEW
Figure 6. Spectral entropy for the upstairs walking activity signal plot from the MOTIONSENSE dataset. The black signal denotes inertial data and the blue signal represents the spectral entropy of an inertial signal from the MOTIONSENSE dataset.
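The three steps above follow the usual definition of spectral entropy, which can be sketched in Python as follows. The sketch assumes a one-sided power spectrum from a naive discrete Fourier transform and a base-2 logarithm; the paper's exact expressions are not available in this section, so the function names and these choices are assumptions.

```python
import cmath
import math


def power_spectrum(signal):
    """One-sided power spectrum from a naive DFT (adequate for short windows)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_entropy(signal):
    """Normalize the power spectrum, apply the Shannon function, and sum the elements."""
    power = power_spectrum(signal)
    total = sum(power)
    if total == 0:
        return 0.0
    p_sp = [value / total for value in power]               # step 1: normalization
    q_sp = [-p * math.log2(p) for p in p_sp if p > 0]       # step 2: Shannon function
    return sum(q_sp)                                        # step 3: summation


# A pure tone concentrates power in one bin (low entropy); an impulse spreads
# power evenly over all bins (high entropy).
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
impulse = [1.0] + [0.0] * 31
```

A regular, periodic activity would therefore be expected to yield a lower spectral entropy than an irregular one, which is the property the feature exploits.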

Wavelet Packet Entropy (WPE)
Wavelet packet entropy is a time-frequency representation technique that is both effective and reliable for inertial signals. Initially, WPE decomposes an inertial signal into many frequency resolutions, each with its own set of information and approximation factors [35]. The two-level decomposition of walking data is presented in Figure 7. In the WPE expression, $h(c)$ and $g(c)$ denote the two filters used to extract the approximation coefficients (ACs) and detail coefficients (DCs), and $d_{i,j}$ indicates the reconstruction of the IMU signal at the ith and jth node.


Feature Selection via Stochastic Gradient Descent (SGD)
In the proposed HPAR model, features from the different domains were optimized using stochastic gradient descent (SGD). Gradient descent is an important method for finding the solution that minimizes a cost function. It was initially utilized to adapt the weights of neural networks [36]. However, the gradient descent approach may be slow if all the training data are evaluated at each epoch. Furthermore, in some cases, SGD outperforms other gradient optimizers, such as Adam, in terms of adaptability to new data [37]. The training phase ends when the loss on the validation set exceeds the threshold level. Because SGD generates more oscillations throughout the training phase, it requires a larger number of epochs to converge. Despite the extended training period, SGD has two significant advantages. First, the stochastic technique improves the probability of escaping local minima [38,39]. Second, it lowers the risk of abruptly interrupting the training process by ensuring that the model has been through a sufficient number of epochs [40,41]. Therefore, we adopted the SGD approach with minibatches as the optimizer. When combined with sparse data selection, minibatch SGD significantly lowers the cost and inconsistency associated with traditional SGD. The minibatch variant involves a comprehensive analysis combined with adaptive learning rates and initial settings to attain the minimum of the loss function, and the result depends on the learning rate. The initial learning rate was set to the default of 0.01, and the batch size was set to 1000, which may be tuned via regularization parameters.
The SGD update for a training pair $i^{(r)}$, $j^{(r)}$ is as follows:
$$\theta = \theta - \eta \cdot \nabla_{\theta} J\left(\theta; i^{(r)}; j^{(r)}\right)$$
where $\theta$ denotes the model parameters, $\nabla_{\theta} J\left(\theta; i^{(r)}; j^{(r)}\right)$ is the gradient of the objective function $J$, and $\eta$ signifies the learning rate. The minibatch update that attains the lowest loss is denoted by:
$$\theta = \theta - \eta \cdot \nabla_{\theta} J\left(\theta; i^{(r:r+n_{bs})}; j^{(r:r+n_{bs})}\right)$$
where $\nabla_{\theta} J\left(\theta; i^{(r:r+n_{bs})}; j^{(r:r+n_{bs})}\right)$ is the gradient computed over a minibatch and $n_{bs}$ is the size of the minibatch.
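The minibatch update rule can be illustrated with a small Python sketch. The paper does not specify the objective or how features are ranked after optimization, so the sketch assumes a linear model with squared-error loss and ranks features by the magnitude of their learned weights; the toy data, batch size, and epoch count are hypothetical, and only the learning rate of 0.01 comes from the text.

```python
import random


def minibatch_sgd(features, targets, learning_rate=0.01, batch_size=4, epochs=500, seed=0):
    """Fit linear weights theta with minibatch SGD on a squared-error loss (assumed objective)."""
    rng = random.Random(seed)
    n_samples, n_features = len(features), len(features[0])
    theta = [0.0] * n_features
    indices = list(range(n_samples))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order each epoch
        for start in range(0, n_samples, batch_size):
            batch = indices[start:start + batch_size]
            grad = [0.0] * n_features
            for idx in batch:
                error = sum(w * x for w, x in zip(theta, features[idx])) - targets[idx]
                for k in range(n_features):
                    grad[k] += error * features[idx][k] / len(batch)
            # theta = theta - eta * gradient of J over the current minibatch
            theta = [w - learning_rate * g for w, g in zip(theta, grad)]
    return theta


# Toy data: three candidate features, of which only the first drives the target.
data_rng = random.Random(1)
toy_features = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(40)]
toy_targets = [2.0 * row[0] for row in toy_features]

theta = minibatch_sgd(toy_features, toy_targets)
# Hypothetical selection rule: rank features by absolute weight, largest first.
ranking = sorted(range(len(theta)), key=lambda k: abs(theta[k]), reverse=True)
```

On this toy problem the informative feature receives a weight close to 2 while the others stay near zero, so a weight-based ranking recovers it.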

Classification
After the feature selection step, we tested our proposed HPAR model on five benchmark datasets, IM-WSHA, PAMAP-2, UCI HAR, MobiAct, and MOTIONSENSE, which were composed of diverse classes of human daily living activities. The optimal feature descriptors selected by SGD were classified by random forest (RF), a state-of-the-art classifier that follows ensemble learning techniques for classification and regression. The random forest classifier builds on bagged trees, in which each tree is trained on a bootstrap sample of the training set. In our case, bagging drew samples from each of the five daily living activity datasets. A decision tree was built for each sample, and finally all decision trees were combined based on the highest number of votes to deliver the best results. Figure 8 illustrates the overall architecture of the random forest classifier. The classified vectors for the IM-WSHA dataset are shown in Figure 9. For $r = 1, \ldots, R$, we trained a model $f_r$ on $A_r$, $B_r$:
$$\hat{f} = \frac{1}{R} \sum_{r=1}^{R} f_r(y') \tag{10}$$
where $\hat{f}$ indicates the prediction for the random samples $y'$. It was calculated by averaging the predictions of all decision trees on $y'$. The total number of samples is represented as $R$, which is a free parameter.
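The bagging and voting procedure can be sketched in Python as follows. For brevity the sketch uses depth-1 trees (decision stumps) on a single feature and omits the random feature subsampling that a full random forest adds; the function names, labels, and data are hypothetical.

```python
import random
from collections import Counter


def train_stump(samples, labels):
    """Fit a one-feature threshold classifier (a depth-1 tree) by exhaustive search."""
    best = None
    for threshold in sorted(set(samples)):
        left = [l for s, l in zip(samples, labels) if s <= threshold]
        right = [l for s, l in zip(samples, labels) if s > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def bagged_stumps(samples, labels, n_trees=25, seed=0):
    """Train each tree f_r on its own bootstrap sample (A_r, B_r) drawn with replacement."""
    rng = random.Random(seed)
    n = len(samples)
    trees = []
    for _ in range(n_trees):
        picks = [rng.randrange(n) for _ in range(n)]
        trees.append(train_stump([samples[i] for i in picks], [labels[i] for i in picks]))
    return trees


def predict(trees, x):
    """Combine the trees by majority vote."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]


# Hypothetical one-dimensional feature (e.g., signal energy) for two activities.
energy = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4]
activity = ["sitting"] * 4 + ["walking"] * 4
forest = bagged_stumps(energy, activity)
```

Majority voting is the classification counterpart of the averaging in Equation (10): each tree contributes one prediction and the most frequent label is returned.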

Discussion
All experiments and testing were performed using an HP laptop configured with an Intel Core i5-8300H CPU operating at a base frequency of 2.30 GHz, 8 GB of RAM, and an Nvidia GTX 1050 Ti dedicated graphics card, running Windows 10 Pro 64-bit with Google Colab and MATLAB. Additionally, a model for evaluating the performance of our HPAR system on the five benchmark datasets was constructed. Furthermore, we used the leave-one-subject-out (LOSO) cross-validation scheme to assess the recognition performance of our HPAR model in different indoor and outdoor settings.
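The LOSO scheme can be expressed as a simple index splitter, sketched below in Python. The function name and the subject identifiers are hypothetical; the sketch only shows how the folds are formed, not how the model is trained on them.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical subject label for each feature window.
window_subjects = ["s1", "s1", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(window_subjects))
```

Because every window from the held-out subject is excluded from training, the resulting accuracy reflects how well the model generalizes to a person it has never seen.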

Benchmark Datasets
The first benchmark dataset-the IM-Wearable Smart Home Activities (IM-WSHA) [42] database-contains signal data from five IMU sensors, including three-axis accelerometers, gyroscopes, and magnetometers. Additionally, these IMU sensors were incorporated into three separate bodily regions, the chest, thigh, and wrist, to extract real-time human motion features of daily living activities. Ten individuals (five males and five females) attempted eleven different physical activities in the indoor setting, including walking, exercising, cooking, drinking, phone conversation, ironing, watching TV, reading a book, brushing hair, using the computer, and vacuum-cleaning.
The second benchmark dataset, physical activity monitoring for aging people, also referred to as the PAMAP-2 [43] dataset, is openly accessible via the UCI Machine Learning Repository. The PAMAP-2 database contains data from three wireless inertial sensors, each incorporating three-axis accelerometers, gyroscopes, and magnetometers, worn on the individual's wrist, chest, and ankle during 18 static and dynamic daily physical activities. However, twelve of these activities were evaluated: walking, cycling, lying down, sitting, standing, Nordic walking, running, rope jumping, ironing, house cleaning, and ascending and descending stairs. Furthermore, this database involves recurring daily activities, which the HPAR model uses to analyze sophisticated motion patterns.
The third benchmark dataset, the MOTIONSENSE [44] dataset, is a publicly available database that contains smartphone tri-axial accelerometer and tri-axial gyroscope data. Each subject carried the smartphone in a front pocket. A total of 24 individuals (14 males and 10 females) performed six life-log activities (walking, sitting, running, standing, ascending, and descending) in both indoor and outdoor settings.
The fourth benchmark dataset was the Human Activity Recognition database (UCI HAR) [45]. Researchers acquired triaxial linear acceleration and rotational motion data using the smartphone accelerometer and gyroscope sensors at a sampling rate of 50 Hz. These data were denoised with a median filter and a low-pass Butterworth filter with a 20 Hz cutoff frequency. This cutoff is appropriate for detecting human body movements since 99% of their energy is contained below 15 Hz. The acceleration signal, which comprises gravitational and body motion components, was separated into body acceleration and gravity using another Butterworth low-pass filter.
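The preprocessing chain described for UCI HAR can be sketched in plain Python under several assumptions: the filter here is a second-order Butterworth biquad (the order actually used for the dataset is not stated in this text), the 0.3 Hz gravity cutoff is an assumed value, and the names `median_filter`, `butter2_lowpass`, and `split_body_gravity` are hypothetical. The filters are causal with zero initial state, so the first few seconds of output contain a start-up transient.

```python
import math
from statistics import median


def median_filter(x, kernel=3):
    """Sliding median with edge replication; kernel must be odd."""
    half = kernel // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [median(padded[i:i + kernel]) for i in range(len(x))]


def butter2_lowpass(x, cutoff_hz, fs_hz):
    """Second-order Butterworth low-pass as a bilinear-transform biquad (Q = 1/sqrt(2))."""
    w0 = 2.0 * math.pi * cutoff_hz / fs_hz
    alpha = math.sin(w0) / math.sqrt(2.0)  # sin(w0) / (2 * Q) with Q = 1/sqrt(2)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b0 = (1.0 - cos_w0) / 2.0 / a0
    b1 = (1.0 - cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    y = []
    x1 = x2 = y1 = y2 = 0.0  # zero initial state
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def split_body_gravity(raw, fs_hz=50.0, noise_cutoff_hz=20.0, gravity_cutoff_hz=0.3):
    """Denoise one acceleration axis, then split it into body and gravity parts.

    Assumption: gravity_cutoff_hz = 0.3 is an illustrative value, not taken from the text.
    """
    denoised = butter2_lowpass(median_filter(raw), noise_cutoff_hz, fs_hz)
    gravity = butter2_lowpass(denoised, gravity_cutoff_hz, fs_hz)
    body = [d - g for d, g in zip(denoised, gravity)]
    return body, gravity


# Synthetic axis: constant gravity plus a 2 Hz body movement, 30 s at 50 Hz.
fs = 50.0
raw = [9.81 + math.sin(2.0 * math.pi * 2.0 * n / fs) for n in range(1500)]
body, gravity = split_body_gravity(raw, fs)
```

After the transient has died out, `gravity` settles near the constant offset and `body` retains the oscillating component.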
The fifth benchmark dataset, the MobiAct dataset [46], consists of tri-axial data for 15 activities of daily living (ADLs) and falls from 67 individuals, captured using a Samsung Galaxy S3. A frame size of 5 s was examined at a sampling frequency of 87 Hz. Moreover, each individual's sex, age, body weight, and height were recorded. The device was randomly oriented at a location freely selected by the individual. Table 1 presents a comprehensive comparison of the five benchmark datasets.
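A 5 s frame at 87 Hz corresponds to 5 × 87 = 435 samples per window. The segmentation step can be sketched as below; the function name `segment` and the `overlap` parameter are assumptions, since the text does not state whether consecutive frames overlap.

```python
def segment(signal, fs_hz=87, window_s=5.0, overlap=0.0):
    """Split a 1-D signal into fixed-length windows.

    overlap is the fraction of a window shared with the next one (an assumed
    parameter; 0.0 gives back-to-back frames). Trailing samples that do not
    fill a whole window are dropped.
    """
    size = int(round(fs_hz * window_s))
    step = max(1, int(round(size * (1.0 - overlap))))
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


samples = list(range(1000))            # stand-in for one sensor axis
frames = segment(samples)              # non-overlapping 435-sample frames
half_overlap = segment(samples, overlap=0.5)
```

Features would then be computed per frame, so each frame yields one feature vector and one activity label.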

Experimental Result and Evaluation
We evaluated the performance of a state-of-the-art random forest classifier by feeding it the optimally selected features of different domains, including statistical, HHT, HWT, spectral entropy, and wavelet packet entropy descriptors, on the PAMAP-2, MOTIONSENSE, UCI HAR, MobiAct, and IM-WSHA benchmark databases. The experimental evaluation was conducted three-fold to assess the performance of the HPAR framework on the benchmark datasets. Figure 10a presents the confusion matrix for the IM-WSHA dataset for eleven daily living activities, where a total accuracy of 90.18% was achieved. For the PAMAP-2 dataset, Figure 10b indicates a recognition rate of 91.25% over twelve physical activities. For the MOTIONSENSE dataset, Figure 10c depicts an average accuracy of 92.16% over six static and dynamic activities: walking, sitting, standing, jogging, upstairs, and downstairs. The smartphone-based inertial sensor datasets, namely UCI HAR and MobiAct, also achieved significant results. Figure 10d shows the confusion matrix of the UCI HAR dataset, which attained a mean accuracy of 91.83%. Figure 10e presents the confusion matrix of the MobiAct dataset, which achieved a 90.46% recognition rate.
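The way an overall accuracy and per-class scores are read off a confusion matrix can be sketched as follows. The function name `per_class_metrics` and the 2 × 2 matrix are made up for illustration and are not taken from Figure 10; rows are assumed to be true classes and columns predicted classes.

```python
def per_class_metrics(cm):
    """Return (accuracy, [(precision, recall, f1), ...]) from a confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    report = []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))  # column sum
        actual = sum(cm[k])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2.0 * precision * recall / denom if denom else 0.0
        report.append((precision, recall, f1))
    return accuracy, report


# Illustrative two-class matrix (not data from the paper).
cm = [[8, 2],
      [1, 9]]
accuracy, report = per_class_metrics(cm)
```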

In Tables 2-6, we compare the performance of the HPAR system with two state-of-the-art techniques, the support vector machine (SVM) [47] and AdaBoost [48] classifiers, using accuracy, recall, precision, and F-measure for all activity classes in the five databases. Similarly, in Table 7, we provide the Cohen's kappa and Matthews correlation coefficient for all datasets. Finally, in Table 8, we summarize the results of the comparison between the HPAR model and different state-of-the-art systems.
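Cohen's kappa and the Matthews correlation coefficient can both be computed directly from a confusion matrix. The sketch below uses the standard multiclass definitions; the function names and the example matrix are illustrative assumptions, not values from Table 7.

```python
import math


def cohen_kappa(cm):
    """Cohen's kappa: agreement between predictions and labels, corrected for chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[k][k] for k in range(n)) / total
    expected = sum(
        sum(cm[k]) * sum(cm[i][k] for i in range(n)) for k in range(n)
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)


def matthews_corrcoef(cm):
    """Multiclass Matthews correlation coefficient from a confusion matrix."""
    n = len(cm)
    s = sum(sum(row) for row in cm)                               # all samples
    c = sum(cm[k][k] for k in range(n))                           # correct predictions
    t = [sum(cm[k]) for k in range(n)]                            # true count per class
    p = [sum(cm[i][k] for i in range(n)) for k in range(n)]       # predicted count per class
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = math.sqrt(
        (s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t))
    )
    return numerator / denominator if denominator else 0.0


# Illustrative two-class matrix: rows are true classes, columns are predictions.
cm = [[8, 2],
      [1, 9]]
kappa = cohen_kappa(cm)
mcc = matthews_corrcoef(cm)
```

Both scores are 1 for perfect agreement and close to 0 for chance-level predictions, which makes them more informative than raw accuracy when the activity classes are imbalanced.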

Conclusions
In this study, we presented an HPAR system based on augmented feature descriptors, comprising four major domain features. These domains covered statistical descriptors, the Hilbert-Huang transform, the Haar wavelet transform, spectral entropy, and wavelet packet entropy descriptors. Additionally, these augmented descriptors improved the performance of the proposed HPAR system by assessing spatiotemporal moments and continuous motion patterns of human daily living activities. Furthermore, these descriptors were optimized via stochastic gradient descent (SGD) and were then fed to the random forest (RF) classifier for classification. This work also compares the performance of the SGD-based random forest classifier with other state-of-the-art classifiers, such as the support vector machine (SVM) and AdaBoost. Our system incorporates data processing methods, robust feature extraction methods, and classification algorithms that have the potential to outperform other state-of-the-art methods in recognition rate.